A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm
Computing problems that handle large amounts of data necessitate the use of
lossless data compression for efficient storage and transmission. We present a
novel lossless universal data compression algorithm that uses parallel
computational units to increase the throughput. The input sequence is
partitioned into blocks. Processing each block independently of the other
blocks can accelerate the computation by a factor equal to the number of
blocks, but degrades the compression quality. Instead, our approach is to
first estimate the minimum description length (MDL) context tree source
underlying the entire input, and then encode each of the blocks in parallel
based on the MDL source. With
this two-pass approach, the compression loss incurred by using more parallel
units is insignificant. Our algorithm is work-efficient, i.e., its
computational complexity is . Its redundancy is approximately
bits above Rissanen's lower bound on universal compression
performance, with respect to any context tree source whose maximal depth is at
most . We improve the compression by using different quantizers for
states of the context tree based on the number of symbols corresponding to
those states. Numerical results from a prototype implementation suggest that
our algorithm offers a better trade-off between compression and throughput than
competing universal data compression algorithms.
Comment: Accepted to Journal of Selected Topics in Signal Processing special
issue on Signal Processing for Big Data (expected publication date June
2015). 10 pages double column, 6 figures, and 2 tables. arXiv admin note:
substantial text overlap with arXiv:1405.6322. Version: Mar 2015: Corrected a
typo
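The two-pass idea (estimate one model from the whole input, then code every block against it independently) can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a fixed-depth context model instead of an MDL-pruned context tree, uses an add-1/2 (KT-style) estimate from global counts, reports ideal code lengths in bits instead of running an arithmetic coder, and ignores the cost of describing the model and the state-dependent quantizers. The names `context_counts`, `block_code_length`, and `two_pass_code_length` are hypothetical. Each block charges a uniform cost for its first `depth` symbols so that it remains decodable on its own, which gives one simple reason why more blocks cost slightly more bits.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def context_counts(seq, depth):
    """Pass 1: count which symbol follows each length-`depth` context in the WHOLE input."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(depth, len(seq)):
        counts[seq[i - depth:i]][seq[i]] += 1
    return counts


def block_code_length(seq, start, end, counts, depth, alphabet_size):
    """Pass 2 for one block: ideal code length (bits) under the shared model."""
    bits = 0.0
    for i in range(start, end):
        if i - start < depth:
            # Not enough in-block history yet: charge a uniform cost so the
            # block can be decoded without its neighbours.
            bits += math.log2(alphabet_size)
            continue
        ctx = counts.get(seq[i - depth:i], {})
        total = sum(ctx.values())
        # Add-1/2 (KT-style) estimate from the global counts.
        p = (ctx.get(seq[i], 0) + 0.5) / (total + 0.5 * alphabet_size)
        bits -= math.log2(p)
    return bits


def two_pass_code_length(seq, num_blocks, depth=2):
    """Total ideal code length when num_blocks blocks share one global model."""
    alphabet_size = len(set(seq))
    counts = context_counts(seq, depth)  # pass 1: one model for the entire input
    n = len(seq)
    bounds = [(b * n // num_blocks, (b + 1) * n // num_blocks) for b in range(num_blocks)]
    with ThreadPoolExecutor() as pool:  # pass 2: blocks are coded independently
        return sum(pool.map(
            lambda se: block_code_length(seq, se[0], se[1], counts, depth, alphabet_size),
            bounds))
```

For example, `two_pass_code_length("ab" * 50, 4)` codes four blocks of 25 symbols against one shared model. Python threads only illustrate the independence of the blocks; because of the global interpreter lock they do not give a real CPU speedup, so a process pool or a compiled implementation would be needed for actual throughput gains.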
A Parallel Two-Pass MDL Context Tree Algorithm for Universal Source Coding
We present a novel lossless universal source coding algorithm that uses
parallel computational units to increase the throughput. The input sequence
is partitioned into blocks. Processing each block independently of the other
blocks can accelerate the computation by a factor equal to the number of
blocks, but degrades the compression quality. Instead, our approach is to first
estimate the minimum description length (MDL) source underlying the entire
input, and then encode each of the blocks in parallel based on the MDL source. With
this two-pass approach, the compression loss incurred by using more parallel
units is insignificant. Our algorithm is work-efficient, i.e., its
computational complexity is . Its redundancy is approximately
bits above Rissanen's lower bound on universal coding performance,
with respect to any tree source whose maximal depth is at most
Empirical Bayes and Full Bayes for Signal Estimation
We consider signals that follow a parametric distribution where the parameter
values are unknown. To estimate such signals from noisy measurements in scalar
channels, we study the empirical performance of an empirical Bayes (EB)
approach and a full Bayes (FB) approach. We then apply EB and FB to solve
compressed sensing (CS) signal estimation problems by successively denoising a
scalar Gaussian channel within an approximate message passing (AMP) framework.
Our numerical results show that FB achieves better performance than EB in
scalar channel denoising problems when the signal dimension is small. In the CS
setting, the signal dimension must be large enough for AMP to work well; for
large signal dimensions, AMP performs similarly with FB and with EB.
Comment: This work was presented at the Information Theory and Applications
Workshop (ITA), San Diego, CA, Feb. 201
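The difference between EB and FB in a scalar Gaussian channel can be sketched as follows. This assumes a Bernoulli-Gaussian signal (zero with probability 1 - p, standard normal otherwise) whose sparsity p is unknown, a known noise variance `s2`, and a hypothetical grid of candidate p values; the AMP part of the abstract is not shown. EB plugs in the p that maximizes the marginal likelihood, while FB averages the posterior-mean denoiser over p with a uniform prior on the grid. All function names are illustrative.

```python
import math
import random

GRID = [k / 20 for k in range(1, 20)]  # hypothetical candidate sparsity levels p


def npdf(y, var):
    """Zero-mean Gaussian density with variance `var`."""
    return math.exp(-y * y / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_lik(ys, p, s2):
    """log p(ys | p): y ~ N(0, s2) if x = 0, y ~ N(0, 1 + s2) if x ~ N(0, 1)."""
    return sum(math.log((1 - p) * npdf(y, s2) + p * npdf(y, 1 + s2)) for y in ys)


def posterior_mean(y, p, s2):
    """E[x | y, p] for the Bernoulli-Gaussian prior in a Gaussian channel."""
    on = p * npdf(y, 1 + s2)
    off = (1 - p) * npdf(y, s2)
    return on / (on + off) * y / (1 + s2)


def eb_denoise(ys, s2):
    """Empirical Bayes: plug in the maximum-marginal-likelihood p."""
    p_hat = max(GRID, key=lambda p: log_lik(ys, p, s2))
    return [posterior_mean(y, p_hat, s2) for y in ys]


def fb_denoise(ys, s2):
    """Full Bayes: average over p with a uniform prior on GRID."""
    logs = [log_lik(ys, p, s2) for p in GRID]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # numerically stable normalization
    z = sum(weights)
    return [sum(w / z * posterior_mean(y, p, s2) for w, p in zip(weights, GRID))
            for y in ys]


def simulate(n, p_true=0.2, s2=0.1, seed=0):
    """Return (EB MSE, FB MSE) on one synthetic draw of length n."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) if rng.random() < p_true else 0.0 for _ in range(n)]
    ys = [x + rng.gauss(0, math.sqrt(s2)) for x in xs]
    mse = lambda est: sum((e - x) ** 2 for e, x in zip(est, xs)) / n
    return mse(eb_denoise(ys, s2)), mse(fb_denoise(ys, s2))
```

A single call such as `simulate(20)` is one noisy draw; comparing EB and FB for small versus large `n` would require averaging over many seeds.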
A Study on the Impact of Locality in the Decoding of Binary Cyclic Codes
In this paper, we study the impact of locality on the decoding of binary
cyclic codes under two approaches, namely ordered statistics decoding (OSD) and
trellis decoding. Given a binary cyclic code having locality or availability,
we suitably modify the OSD to obtain gains in signal-to-noise ratio (SNR) for a
given reliability, at essentially the same decoder complexity. With regard to
trellis decoding, we show that careful introduction
of locality results in the creation of cyclic subcodes having lower maximum
state complexity. We also present a simple upper-bounding technique on the
state complexity profile, based on the zeros of the code. Finally, it is shown
how the decoding speed can be significantly increased in the presence of
locality, in the moderate-to-high SNR regime, by making use of a quick-look
decoder that often returns the maximum-likelihood (ML) codeword.
Comment: Extended version of a paper submitted to ISIT 201
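For readers unfamiliar with ordered statistics decoding, a minimal sketch of plain order-i OSD (not the locality-aware modification studied in the paper) is given below. It assumes BPSK transmission (bit 0 maps to +1, bit 1 to -1) and uses the (7,4) cyclic Hamming code generated by g(x) = 1 + x + x^3 as a hypothetical example. The decoder finds the most reliable independent positions by Gaussian elimination over GF(2), re-encodes the hard decisions on those positions with up to `order` bit flips, and keeps the candidate with the smallest correlation discrepancy.

```python
from itertools import combinations

# Generator matrix of the (7,4) cyclic Hamming code, g(x) = 1 + x + x^3 (rows are shifts of g).
G = [[1, 1, 0, 1, 0, 0, 0],
     [0, 1, 1, 0, 1, 0, 0],
     [0, 0, 1, 1, 0, 1, 0],
     [0, 0, 0, 1, 1, 0, 1]]


def osd(r, G, order=1):
    """Order-`order` OSD for BPSK soft values r (bit 0 -> +1, bit 1 -> -1)."""
    k, n = len(G), len(G[0])
    hard = [0 if v >= 0 else 1 for v in r]
    rows = [row[:] for row in G]
    pivots = []
    # Gaussian elimination over GF(2), visiting columns from most to least reliable;
    # a column that depends on earlier pivot columns is skipped.
    for col in sorted(range(n), key=lambda j: -abs(r[j])):
        if len(pivots) == k:
            break
        t = len(pivots)
        pr = next((i for i in range(t, k) if rows[i][col]), None)
        if pr is None:
            continue
        rows[t], rows[pr] = rows[pr], rows[t]
        for i in range(k):
            if i != t and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[t])]
        pivots.append(col)
    # rows[i] has a 1 in column pivots[i] and 0 in the other pivot columns, so
    # re-encoding u yields the codeword whose bits on the pivot positions equal u.
    base = [hard[c] for c in pivots]
    best, best_cost = None, float("inf")
    for w in range(order + 1):
        for flips in combinations(range(k), w):
            u = base[:]
            for f in flips:
                u[f] ^= 1
            cw = [0] * n
            for i in range(k):
                if u[i]:
                    cw = [a ^ b for a, b in zip(cw, rows[i])]
            # Correlation discrepancy: total reliability of bits that disagree with hard decisions.
            cost = sum(abs(r[j]) for j in range(n) if cw[j] != hard[j])
            if cost < best_cost:
                best, best_cost = cw, cost
    return best
```

For instance, `osd([0.9, 1.1, -0.2, 0.8, 1.0, 0.7, 1.2], G)` corrects the single unreliable position and returns the all-zero codeword.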
Rate-Optimal Streaming Codes for Channels with Burst and Isolated Erasures
Recovery of data packets from packet erasures in a timely manner is critical
for many streaming applications. An early paper by Martinian and Sundberg
introduced a framework for streaming codes and designed rate-optimal codes that
permit delay-constrained recovery from an erasure burst up to a given length. A
recent work by Badr et al. extended this result and introduced a sliding-window
channel model. Under this model, in a sliding window of fixed width, one of the
following erasure patterns is possible: (i) a single burst of bounded length, or
(ii) a bounded number of (possibly non-contiguous) arbitrary erasures. Badr et
al. obtained a rate upper bound for streaming codes that can recover, within a
given time delay, from any erasure pattern permissible under this model.
However, constructions matching the bound were known only for a few parameter
sets. In this paper, we present an explicit family of codes that achieves the
rate upper bound for all feasible parameters.
Comment: shorter version submitted to ISIT 201
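The sliding-window erasure model described above can be made concrete with a small admissibility check. This is a sketch under stated assumptions: `W` (window width), `N` (maximum number of arbitrary erasures), and `B` (maximum burst length) are hypothetical parameter names chosen here, an erasure pattern is a 0/1 list with 1 marking an erased packet, and a finite pattern shorter than `W` is treated as a single window. The function `admissible` only tests whether a pattern belongs to the model; it does not construct or decode a streaming code.

```python
def admissible(pattern, W, N, B):
    """True if every window of W consecutive slots contains either at most N
    erasures, or erasures that all fit inside one burst of length <= B.
    `pattern` is a sequence of 0/1 values (1 = erased packet)."""
    n = len(pattern)
    for start in range(max(1, n - W + 1)):
        idx = [i for i in range(start, min(start + W, n)) if pattern[i]]
        # Violation: too many erasures for case (ii) AND too spread out for case (i).
        if len(idx) > N and idx[-1] - idx[0] + 1 > B:
            return False
    return True
```

With `W=4, N=1, B=3`, a burst of three consecutive erasures is admissible, whereas two isolated erasures three slots apart inside the same window are not.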
Sequential Gradient Coding For Straggler Mitigation
In distributed computing, slower nodes (stragglers) usually become a
bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient
technique that uses principles of error-correcting codes to distribute gradient
computation in the presence of stragglers. In this paper, we consider the
distributed computation of a sequence of gradients, where processing of each
gradient starts in a given round and must finish within a number of rounds set
by a delay parameter. For the GC scheme, coding is only across computing nodes,
which corresponds to a solution with zero delay. On the other hand, allowing a
positive delay makes it possible to design schemes that exploit the temporal
dimension as well. In this work, we propose two schemes
that demonstrate improved performance compared to GC. Our first scheme combines
GC with selective repetition of previously unfinished tasks and achieves
improved straggler mitigation. In our second scheme, which constitutes our main
contribution, we apply GC to a subset of the tasks and repetition for the
remainder of the tasks. We then multiplex these two classes of tasks across
workers and rounds in an adaptive manner, based on past straggler patterns.
Using theoretical analysis, we demonstrate that our second scheme achieves a
significant reduction in the computational load. In our experiments, we study a
practical setting of concurrently training multiple neural networks over an AWS
Lambda cluster involving 256 worker nodes, where our framework naturally
applies. We demonstrate that the second scheme can yield a 16% improvement in
runtime over the baseline GC scheme, in the presence of naturally occurring,
non-simulated stragglers.
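As background for the GC baseline, the fractional repetition construction of gradient coding from Tandon et al. can be sketched as follows. This shows only the single-round, coding-across-workers baseline, not the sequential schemes proposed in the paper. It assumes `n` workers, `n` data partitions, tolerance of `s` stragglers with (s + 1) dividing n, and scalar partial gradients for simplicity; the helper names `assign`, `encode`, `decode`, and `recovers_all` are hypothetical. Workers are split into groups of s + 1 that all hold the same s + 1 partitions, so any s stragglers leave at least one survivor per group.

```python
from itertools import combinations


def assign(n, s):
    """Fractional-repetition GC: worker w holds the s+1 partitions of its group."""
    assert n % (s + 1) == 0, "this construction needs (s + 1) to divide n"
    g = s + 1
    return [list(range((w // g) * g, (w // g + 1) * g)) for w in range(n)]


def encode(partials, parts):
    """A worker sends the sum of the partial gradients it was assigned."""
    return sum(partials[p] for p in parts)


def decode(messages, n, s):
    """Recover the full gradient from any n - s workers (dict: worker -> message)."""
    g = s + 1
    total = 0
    for group in range(n // g):
        # A group has s + 1 members, so at least one survives any s stragglers.
        survivor = next(w for w in range(group * g, (group + 1) * g) if w in messages)
        total += messages[survivor]
    return total


def recovers_all(partials, n, s):
    """Check that decoding succeeds for every possible set of s stragglers."""
    parts = assign(n, s)
    full = sum(partials)
    for stragglers in combinations(range(n), s):
        msgs = {w: encode(partials, parts[w]) for w in range(n) if w not in stragglers}
        if decode(msgs, n, s) != full:
            return False
    return True
```

The price of this straggler tolerance is that every worker computes s + 1 partial gradients instead of one; reducing that computational load is the motivation for exploiting the temporal dimension.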
- …